47 research outputs found

    Penalized Orthogonal-Components Regression for Large p Small n Data

    We propose a penalized orthogonal-components regression (POCRE) for large p small n data. Orthogonal components are sequentially constructed to maximize, upon standardization, their correlation to the response residuals. A new penalization framework, implemented via empirical Bayes thresholding, is presented to effectively identify sparse predictors of each component. POCRE is computationally efficient owing to its sequential construction of leading sparse principal components. In addition, such construction offers other properties such as grouping highly correlated predictors and allowing for collinear or nearly collinear predictors. With multivariate responses, POCRE can construct common components and thus build up latent-variable models for large p small n data.

    Case-control genome-wide association study of rheumatoid arthritis from Genetic Analysis Workshop 16 using penalized orthogonal-components regression-linear discriminant analysis

    Currently, genome-wide association studies (GWAS) are conducted by collecting a massive number of single-nucleotide polymorphisms (SNPs) (i.e., large p) for a relatively small number of individuals (i.e., small n), and associations are made between clinical phenotypes and genetic variation one SNP at a time. Univariate association approaches like this ignore the linkage disequilibrium between SNPs in regions of low recombination. This results in a low reliability of candidate gene identification. Here we propose to improve the case-control GWAS approach by implementing linear discriminant analysis (LDA) through a penalized orthogonal-components regression (POCRE), a newly developed variable selection method for large p small n data. The proposed POCRE-LDA method was applied to the Genetic Analysis Workshop 16 case-control data for rheumatoid arthritis (RA). In addition to the two regions on chromosomes 6 and 9 previously associated with RA by GWAS, we identified SNPs on chromosomes 10 and 18 as potential candidates for further investigation.

    Long-Term Outcomes of Three-Dimensional High-Dose-Rate Brachytherapy for Locally Recurrent Early T-Stage Nasopharyngeal Carcinoma

    Background: Brachytherapy (BT) is one of the techniques available for retreatment of patients with locally recurrent nasopharyngeal carcinoma (rNPC). In this study, we evaluated the treatment outcome and late toxicities of three-dimensional high-dose-rate brachytherapy (3D-HDR-BT) for patients with locally rNPC. Materials and Methods: This is a retrospective study involving 36 patients with histologically confirmed rNPC from 2004 to 2011. Of the 36 patients, 17 underwent combined-modality treatment (CMT) consisting of external beam radiotherapy (EBRT) followed by 3D-HDR-BT, while the other 19 underwent 3D-HDR-BT alone. The median dose of EBRT for the CMT group was 60 (range, 50–66) Gy, with an additional median dose of BT of 16 (range, 9–20) Gy. The median dose for the 3D-HDR-BT group was 32 (range, 20–36) Gy. The measured treatment outcomes were the 5- and 10-year locoregional recurrence-free survival (LRFS), disease-free survival (DFS), overall survival (OS), and late toxicities. Results: The median age at recurrence was 44.5 years. The median follow-up period was 70 (range, 6–142) months. The 5-year LRFS, DFS, and OS for the entire patient group were 75.4, 55.6, and 74.3%, respectively, while the 10-year LRFS, DFS, and OS for the entire patient group were 75.4, 44.2, and 53.7%, respectively. The 10-year LRFS in the CMT group was higher than that in the 3D-HDR-BT-alone group (93.8 vs. 58.8%, HR: 7.595, 95% CI: 1.233–61.826, p = 0.025). No grade 4 late radiotherapy-induced toxicities were observed. Conclusions: 3D-HDR-BT achieves favorable clinical outcomes with mild late toxicity in patients with locally rNPC.

    Comparison of normalization and differential expression analyses using RNA-Seq data from 726 individual Drosophila melanogaster

    Comparison of normalization methods across conditions. Boxplots show the differences in the coefficient of variation across flies in each genotype/sex/environment condition. (PDF 245 kb)

    Genome-wide association analysis of GAW17 data using an empirical Bayes variable selection

    Next-generation sequencing technologies enable us to explore rare functional variants. However, most current statistical techniques are too underpowered to capture signals of rare variants in genome-wide association studies. We propose a supervised coalescing of single-nucleotide polymorphisms to obtain gene-based markers that can stably reveal possible genetic effects related to rare alleles. We use a newly developed empirical Bayes variable selection algorithm to identify associations between studied traits and genetic markers. Using our novel method, we analyzed the three continuous phenotypes in the GAW17 data set across 200 replicates, with intriguing results.

    Genome-wide case-control study in GAW17 using coalesced rare variants

    Genome-wide association studies have successfully identified numerous loci at which common variants influence disease risks or quantitative traits of interest. Despite these successes, the variants identified by these studies have generally explained only a small fraction of the variation in the phenotype. One explanation may be that many rare variants that are not included in the common genotyping platforms contribute substantially to the genetic variation of the diseases. Next-generation sequencing, which better allows for the analysis of rare variants, is now becoming available and affordable; however, the presence of a large number of rare variants makes it statistically challenging to stably identify the disease-causing genetic variants. We conduct a genome-wide association study of the Genetic Analysis Workshop 17 case-control data, produced by next-generation sequencing, and propose collapsing the rare variants within each genetic region through a supervised dimension reduction algorithm, which yields several macrovariants per region. A simultaneous association of the phenotype with all common variants and macrovariants is undertaken by linear discriminant analysis using the penalized orthogonal-components regression algorithm. The results suggest that the proposed analysis strategy shows promise but needs further development.

    Supervised dimension reduction for high-dimensional generalized linear models

    Dimensionality reduction has become an increasingly important strategy in high-dimensional data analysis in modern statistics. This is largely driven by the need to analyze massive data sets involving ill-posed problems due to high dimensionality and multicollinearity issues. In this thesis, we propose two new regression-based modeling methods for high-dimensional classification problems by implementing the dimension reduction idea. In order to deal with the generalized linear model (GLM) with high-dimensional data, we propose a strategy to implement the supervised dimension reduction idea in partial least squares (PLS) to fit high-dimensional GLMs. We intend to build up generalized orthogonal-components regression (GOCRE) for GLMs. Unlike the existing methods based on the extension of PLS to categorical data, we sequentially construct orthogonal predictors and each orthogonal predictor is the resultant of convergence construction. The bias correction procedure by Firth (1993) is also applied. In order to simultaneously implement dimension reduction and variable selection ideas in high-dimensional data analysis, we develop Sparse-GOCRE by incorporating a penalized approach into the GOCRE framework. Within the sequential construction of components in the framework of GOCRE, a penalized approach is used to identify the sparse predictors for each component. Two different penalized strategies are considered, i.e., the L1 penalty and the empirical Bayes thresholding strategy. Our methods not only provide a solution to the high dimensionality issue but are also able to identify the variables that are highly correlated or share some common coherent patterns. Both simulation studies and real data analysis of gene expression microarray data are presented to illustrate the competitive performance of our methods in comparison with several existing methods.